1 Questions

2 Notes

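  • The outcome \(ae\_hyperTAG\_any\) shows quasi-complete separation with respect to \(thyroid\_disease\_before\): all five patients with prior thyroid disease developed hypertriglyceridemia. As a result, logistic regression cannot find a finite maximum-likelihood estimate for this predictor. The fitted probability for the thyroid-disease group is pushed towards 1, which produces the implausible odds ratio (about 1.65 × 10⁷), the p-value of 0.995, and the missing upper confidence limit reported in Section 5.3.1.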
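The sketch below reproduces the separation problem in base R. The 2×2 counts are reconstructed from the overview table in Section 4 (5 of 45 patients had prior thyroid disease, and 5 of 45 had no hypertriglyceridemia). It assumes, as the diverging odds ratio implies, that no patient with prior thyroid disease was free of the event. The sketch then shows two common finite alternatives: the Haldane–Anscombe correction (+0.5 per cell) and Fisher's exact test. These alternatives are illustrations, not the method used in this report.

```r
# Counts reconstructed from the overview table (assumption: all 5 patients with
# prior thyroid disease had hypertriglyceridemia, as the diverging OR implies)
d_sep <- data.frame(
  thyroid_disease_before = c(rep(1, 5), rep(0, 40)),
  ae_hyperTAG_any        = c(rep(1, 5), rep(1, 35), rep(0, 5))
)

# Ordinary maximum-likelihood logistic fit: the slope drifts towards +Inf and
# its standard error explodes, so the Wald p-value is close to 1
fit_ml  <- glm(ae_hyperTAG_any ~ thyroid_disease_before,
               data = d_sep, family = binomial)
coef_ml <- coef(summary(fit_ml))["thyroid_disease_before", ]
coef_ml

# Haldane-Anscombe correction: add 0.5 to every cell to obtain a finite OR
tab   <- table(thyroid = d_sep$thyroid_disease_before, ae = d_sep$ae_hyperTAG_any)
tab_c <- tab + 0.5
or_haldane <- (tab_c["1", "1"] * tab_c["0", "0"]) /
              (tab_c["1", "0"] * tab_c["0", "1"])
or_haldane   # about 1.70: the association itself is weak (100% vs 87.5%)

# Fisher's exact test handles the zero cell directly
p_fisher <- fisher.test(tab)$p.value
p_fisher
```

The corrected odds ratio of about 1.7 reflects the small difference in event rates (5/5 vs. 35/40). This suggests that the astronomical maximum-likelihood estimate is an artefact of the empty cell rather than a strong effect.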

3 Abbreviations

| Parameter | Description | Column name |
|---|---|---|
| Initials | XX = initials of surname and forename | initials |
| Center | CZ/DE | center |
| Age | At the time of initiation of bexarotene (years) | age |
| Sex | M/F | sex |
| BMI | At the time of initiation of bexarotene | bmi |
| PS ECOG | At the time of initiation of bexarotene | ps_ecog |
| CTCL type | According to WHO-EORTC 2018 | ctcl_type |
| Stage | At the time of initiation of bexarotene | stage |
| Early stage? | 1 = yes, 0 = no | stage_early |
| T | At the time of initiation of bexarotene | t_stage |
| N | At the time of initiation of bexarotene | n_stage |
| M | At the time of initiation of bexarotene | m_stage |
| B | At the time of initiation of bexarotene | b_stage |
| Time since Dg. | At the time of initiation of bexarotene (months) | months_since_diagnosis |
| Time since 1st clin. manifestation | Before the initiation of bexarotene (months) | months_since_first_symptom |
| Radiologic examinations | Diagnostic procedures such as CT, PET, USG at any time during the course of the disease (0 = clinical examination only) | radiologic_exams |
| SDT before | Skin-directed therapy preceding the initiation of bexarotene, incl. radiotherapy | sdt_before |
| SysTh before | Systemic treatments preceding the initiation of bexarotene (incl. TSEI with > 50% BSA) | systh_before |
| First syst. therapy? | 1 = yes, 0 = no | first_syst_th |
| Initial dose | Daily dose (mg/m²) | initial_dose_mg_m2 |
| Final dose | Highest tolerated daily dose (mg/m²) | final_dose_mg_m2 |
| Initial SysTh | Other antineoplastic systemic therapies at the time of initiation of bexarotene | initial_systh |
| SysTh during the treatment | Other antineoplastic systemic therapies during the treatment with bexarotene (duration in months) | systh_during_treatment |
| SDT during the treatment | Skin-directed therapy during the treatment with bexarotene | sdt_during_treatment |
| Best treatment response | SD, PR, CR | response_best |
| Response? | 1 = PR, CR; 0 = SD, PD | response_achieved |
| MonoTh? | Was the best treatment response achieved during monotherapy? (1 = yes, 0 = no) | monotherapy |
| Time to response | Time to achieve the best treatment response (0 = only SD achieved) | response_time_to |
| Duration of response | Until progression or last visit (months) | response_duration |
| Progression? | Progression during the treatment (1 = yes, 0 = no) | progression |
| Duration of treatment (months) | | treatment_duration |
| Reason for discontinuation | 0 = treatment continues | discontinuation_reason |
| Discontinued because of AE? | 1 = yes, 0 = no | discontinued_due_to_ae |
| TTNT | Time to next treatment (x = TTNT not achieved) | ttnt |
| TTNT achieved? | 1 = yes, 0 = no (no “next treatment” given) | ttnt_achieved |
| Comorbidities | | comorbidities |
| Dyslipidemia before bexarotene? | 1 = yes, 0 = no | dyslipidemia_before |
| Thyroid disease before bexarotene? | 1 = yes, 0 = no | thyroid_disease_before |
| Adverse events (treatment) | All grades | ae_complete_all |
| AE grade 3/4 (or grade 5) | | ae_complete_grade_3_4 |
| AE grade 3/4 | 1 = yes, 0 = no | ae_grade_3_4 |
| AE hyperTAG (hypertriglyceridemia), any grade? | 1 = yes, 0 = no | ae_hyperTAG_any |
| AE hyperTAG grade 3/4? | 1 = yes, 0 = no | ae_hyperTAG_grade_3_4 |
| AE liver test elevation (any grade) | 1 = yes, 0 = no | ae_liver |
| Haematologic AE (any grade) | 1 = yes, 0 = no | ae_hemato |

4 Exploratory Data Analysis (EDA)

Exploratory data analysis (EDA) involves visualizing and summarizing data to uncover patterns, anomalies, and relationships, to assess data quality, and to guide feature selection. By identifying outliers and missing values and by examining distributions, EDA informs model choice, suggests transformations, and refines hypotheses, making downstream analyses more reliable.

Overview

The table below gives a basic overview of the analyzed data. Seven variables contain one missing value each, and \(discontinued\_due\_to\_ae\) and \(response\_duration\) contain 12 and two missing values, respectively.

d04 |> 
  select(-initials) |>  
  mutate(across(
    .cols = -c(bmi, age), 
    .fns = as.factor
  )) |> 
  my_skim()
Data summary
Name mutate(select(d04, -initi…
Number of rows 45
Number of columns 23
_______________________
Column type frequency:
factor 21
numeric 2
________________________
Group variables None

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
sex 0 1.00 FALSE 2 M: 27, F: 18
ps_ecog 0 1.00 FALSE 2 0: 31, 1: 14
stage_early 1 0.98 FALSE 2 0: 37, 1: 7
first_syst_th 0 1.00 FALSE 2 0: 37, 1: 8
response_achieved 1 0.98 FALSE 2 1: 27, 0: 17
thyroid_disease_before 0 1.00 FALSE 2 0: 40, 1: 5
dyslipidemia_before 0 1.00 FALSE 2 0: 34, 1: 11
monotherapy 0 1.00 FALSE 2 0: 28, 1: 17
ttnt 1 0.98 FALSE 22 5: 6, 3: 5, 4: 4, 6: 4
ttnt_achieved 0 1.00 FALSE 2 1: 35, 0: 10
response_time_to 1 0.98 FALSE 9 0: 15, 2: 11, 3: 5, 5: 5
response_duration 2 0.96 FALSE 25 4: 5, 3: 4, 1: 3, 6: 3
progression 1 0.98 FALSE 2 0: 23, 1: 21
treatment_duration 1 0.98 FALSE 26 6: 5, 4: 4, 3: 3, 8: 3
discontinuation_reason 1 0.98 FALSE 17 PD: 15, 0: 11, Dea: 3, AE : 2
ae_grade_3_4 0 1.00 FALSE 2 1: 23, 0: 22
ae_liver 0 1.00 FALSE 2 0: 32, 1: 13
ae_hemato 0 1.00 FALSE 2 0: 38, 1: 7
discontinued_due_to_ae 12 0.73 FALSE 2 0: 23, 1: 10
ae_hyperTAG_grade_3_4 0 1.00 FALSE 2 0: 27, 1: 18
ae_hyperTAG_any 0 1.00 FALSE 2 1: 40, 0: 5

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist median mad
age 0 1 65.20 13.00 35 56.0 70.0 73.0 91.0 ▂▂▂▇▁ 70.0 8.90
bmi 0 1 26.47 5.48 18 23.1 25.7 29.1 42.8 ▅▇▃▁▁ 25.7 4.15
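The missing-value counts quoted above can be reproduced with base R alone. The same is true of the complete_rate column, which equals 1 − n_missing / 45 (e.g. 33/45 ≈ 0.73 for discontinued_due_to_ae). The sketch uses a small hypothetical data frame in place of d04 and makes no assumptions about the custom my_skim() helper.

```r
# Hypothetical toy data standing in for d04 (NOT the study data)
d_toy <- data.frame(
  stage_early            = c(0, 1, NA, 0, 0),
  discontinued_due_to_ae = c(NA, 1, NA, 0, 0),
  bmi                    = c(24.1, 31.0, 27.5, 22.8, 29.9)
)

# Number of missing values per column (cf. n_missing in the skim output)
n_missing <- colSums(is.na(d_toy))
n_missing

# Share of non-missing values per column (cf. complete_rate)
complete_rate <- 1 - n_missing / nrow(d_toy)
round(complete_rate, 2)

# Rows with at least one missing value; na.omit() would drop all of them
n_incomplete <- sum(!complete.cases(d_toy))
n_incomplete
```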

Proportional Bar plot

A proportional bar plot shows the relative size of each category as a proportion of all non-missing values. The figure below reveals notable imbalances across most variables. In several cases, such as \(ae\_hyperTAG\_any\), \(ae\_hemato\), and \(thyroid\_disease\_before\), the disproportion is strong. Such imbalance leaves very few observations in the minority category. This reduces statistical power, widens confidence intervals, and can make models unstable (see the separation issue described in Notes).

d04 |> 
  select(-initials, -age, -bmi, -contains("duration"), -ttnt,
         -contains("time_to")) |> 
  select(where(is.numeric)) |> 
  mutate(across(everything(),as.factor)) |> 
  pivot_longer(cols = everything()) |> 
  na.omit() |> 
  ggplot(aes(x = name, fill = as.factor(value))) +
  geom_bar(position = "fill") +  # Proportions
  labs(y = "Proportion", fill = "Outcome") +
  theme_sjplot2()+
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  paletteer::scale_fill_paletteer_d("waRhol::marilyn_orange_62") +
  NULL
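The imbalance seen in the plot can also be quantified as the share of the rarer category in each variable. The sketch below uses category counts copied from the overview table in this section. The 20% cut-off used to flag strongly imbalanced variables is an arbitrary illustrative choice, not a rule applied in this report.

```r
# Category counts copied from the overview (skim) table; missing values excluded
counts <- list(
  ae_hyperTAG_any        = c("0" = 5,  "1" = 40),
  thyroid_disease_before = c("0" = 40, "1" = 5),
  ae_hemato              = c("0" = 38, "1" = 7),
  stage_early            = c("0" = 37, "1" = 7),
  sex                    = c(M = 27, F = 18)
)

# Share of the rarer category within each variable
minority_share <- sapply(counts, function(n) min(n) / sum(n))
round(sort(minority_share), 3)

# Variables whose rarer category is below an (arbitrary) 20% cut-off
flagged <- names(minority_share)[minority_share < 0.2]
flagged
```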

5 Statistics

5.1 BMI & hypertriglyceridemia

Box Plot - BMI

This figure shows the distribution of body mass index (BMI) in patients with and without grade 3–4 hypertriglyceridemia. The x-axis separates the two groups: 0 for patients without grade 3–4 hypertriglyceridemia (red dots) and 1 for those with it (blue dots). Each dot is one patient's BMI value.
Within each group, a black crossbar marks the median and black error bars span the median ± the median absolute deviation (MAD). Median and MAD are robust summaries of location and spread, which is useful when the data may contain outliers or are not normally distributed.
A Wilcoxon rank-sum test was used to compare BMI between the two groups. The resulting p-value of 0.0069 indicates a statistically significant difference in BMI distribution between patients with and without grade 3–4 hypertriglyceridemia.
Overall, BMI appears to be associated with severe hypertriglyceridemia. Patients in group 1 tend to have higher BMI values and show greater spread. This is consistent with the logistic regression in Section 5.3.1 (OR 1.23 per one-unit increase in BMI).

# BMI by grade 3-4 hypertriglyceridemia: individual values (beeswarm),
# median (crossbar), median ± MAD (error bars) and Wilcoxon rank-sum p-value
d04 |> 
  ggplot(aes(x = factor(ae_hyperTAG_grade_3_4), 
             y = bmi, 
             color = factor(ae_hyperTAG_grade_3_4))) +
  ggbeeswarm::geom_beeswarm(cex = 3.5, size = 3.5) +
  scale_color_brewer(
    palette = "Set1", 
    name = "Grade 3–4 Hypertriglyceridemia"
  ) +
  stat_summary(fun.data = ggpubr::median_mad, 
               geom = "errorbar", 
               color = "black", 
               width = 0.2, 
               linewidth = 1) +
  stat_summary(fun = median, fun.min = median, fun.max = median,
               geom = "crossbar", 
               color = "black", 
               linewidth = 0.8, 
               width = 0.3) +
  labs(
    y = "Body Mass Index (BMI)",
    title = "Grade 3–4 Hypertriglyceridemia",
    caption = "Crossbars show the median; error bars show the median ± median absolute deviation (MAD)."
  ) +
  theme_minimal(base_size = 13) +
  theme(
    legend.position = "none",
    axis.title.x = element_blank(),
    axis.title.y = element_text(face = "bold", size = 18),
    axis.text = element_text(size = 14, face = "bold"),
    plot.title = element_text(hjust = 0.5, size = 20, face = "bold")
  ) +
  ggpubr::stat_compare_means(method = "wilcox.test", paired = FALSE)
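The summaries behind the figure can be computed directly. stats::mad() rescales by the constant 1.4826 by default, so it estimates the standard deviation under normality. We assume, without having verified it, that ggpubr::median_mad() relies on it in the same way. The BMI values below are hypothetical, not the study data.

```r
# Hypothetical BMI values (kg/m^2) for the two groups; NOT the study data
bmi_no  <- c(22.1, 23.4, 24.0, 24.8, 25.3, 26.0, 27.2)
bmi_yes <- c(25.9, 27.8, 29.5, 30.2, 31.7, 34.0, 38.6)

# Median, MAD (scaled by 1.4826) and the median ± MAD interval
summ <- function(x) c(median = median(x), mad = mad(x),
                      lower = median(x) - mad(x), upper = median(x) + mad(x))
rbind(no = summ(bmi_no), yes = summ(bmi_yes))

# Two-sided Wilcoxon rank-sum (Mann-Whitney) test; exact p-value (no ties)
wt <- wilcox.test(bmi_yes, bmi_no)
wt$p.value
```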

Fisher's exact test

To assess the association between elevated BMI and grade 3–4 hypertriglyceridemia, we performed Fisher's exact tests at two BMI thresholds: 25 (overweight) and 30 (obesity). At each threshold, patients were classified as below the threshold or at/above it. The association was then evaluated in a 2×2 contingency table with Fisher's exact test, which is appropriate for small samples and categorical data.

Results: At a BMI threshold of 25, the odds ratio (OR) was 2.736 (95% CI: 0.671–12.712; p = 0.134), so the association was not significant. At a threshold of 30, the association was statistically significant, with an OR of 7.558 (95% CI: 1.189–86.175; p = 0.019). This suggests that patients with BMI ≥ 30 are more likely to develop grade 3–4 hypertriglyceridemia than those with lower BMI. However, the very wide confidence interval reflects the small number of patients in each cell. The odds ratio reported by fisher.test() is the conditional maximum-likelihood estimate, not the simple cross-product ratio.

# Fisher's exact test of grade 3-4 hypertriglyceridemia vs. BMI dichotomised
# at a threshold (BMI >= threshold coded as 1, otherwise 0)
fisher_at <- function(threshold) {
  d04 |> 
    select(ae_hyperTAG_grade_3_4, bmi) |> 
    mutate(bmi = if_else(bmi < threshold, 0, 1)) |> 
    table() |> 
    fisher.test() |> 
    tidy() |> 
    mutate(Threshold = paste("BMI =", threshold))
}

res_fisher_01 <- bind_rows(fisher_at(25), fisher_at(30)) |> 
  relocate(method, Threshold)

res_fisher_01 |> 
  select(-alternative) |> 
  kable(col.names = c("Method", "Threshold", "Odds ratio", "p-value",
                      "95% CI lower", "95% CI upper"),
        digits = 3) |> 
  kable_styling(
    full_width = FALSE,
    latex_options = c(
      "hold_position" # stop table floating
    ),
    bootstrap_options = c("striped", "hover", "condensed", "responsive")
  ) |> 
  collapse_rows(columns = 1, valign = "top")
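| Method | Threshold | Odds ratio | p-value | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|
| Fisher's Exact Test for Count Data | BMI = 25 | 2.736 | 0.134 | 0.671 | 12.712 |
| Fisher's Exact Test for Count Data | BMI = 30 | 7.558 | 0.019 | 1.189 | 86.175 |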
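The same threshold analysis can be written as a small reusable helper in base R. The function name fisher_by_threshold is hypothetical and the data are made up. The sketch codes BMI ≥ threshold as 1, matching the if_else() in the code above. It also forces both factor levels so that every table is 2×2.

```r
# Hypothetical helper: Fisher's exact test of a binary outcome against BMI
# dichotomised at each threshold (BMI >= threshold coded as 1)
fisher_by_threshold <- function(bmi, outcome, thresholds) {
  do.call(rbind, lapply(thresholds, function(thr) {
    high <- factor(as.integer(bmi >= thr), levels = 0:1)
    tab  <- table(outcome = factor(outcome, levels = 0:1), bmi_high = high)
    ft   <- fisher.test(tab)
    data.frame(threshold  = thr,
               odds_ratio = unname(ft$estimate),   # conditional MLE
               ci_lower   = ft$conf.int[1],
               ci_upper   = ft$conf.int[2],
               p_value    = ft$p.value)
  }))
}

# Hypothetical data (NOT the study data)
bmi     <- c(21, 23, 24, 26, 27, 28, 29, 31, 32, 33, 35, 22, 25, 30, 36, 24)
outcome <- c( 0,  0,  0,  0,  1,  0,  1,  1,  1,  0,  1,  0,  0,  1,  1,  1)
res <- fisher_by_threshold(bmi, outcome, thresholds = c(25, 30))
res
```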

5.2 Multiple correspondence analysis

Multiple correspondence analysis (MCA) is a dimensionality-reduction technique for categorical data. It identifies associations among variable categories and visualizes the underlying structure by projecting the data into a low-dimensional space. The first figure shows the ten variable categories that contribute most to the first two dimensions; categories plotted close together tend to co-occur in the same patients. Because na.omit() removes every patient with at least one missing value, the MCA is based on at most 33 of the 45 patients (\(discontinued\_due\_to\_ae\) alone is missing for 12).

The second figure colors individuals by \(sex\) and by \(ae\_grade\_3\_4\) and overlays concentration ellipses, which show how much the groups overlap. For example, the groups defined by \(ae\_grade\_3\_4\) are clearly separated, whereas those defined by \(sex\) are not.

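# MCA on all categorical variables; na.omit() keeps complete cases only
# (at most 33 patients, because discontinued_due_to_ae has 12 missing values).
# Note: duration columns such as ttnt or treatment_duration become
# many-level factors here.
res_mca_02 <- d04 |> 
  select(-initials, -age, -bmi) |> 
  na.omit() |> 
  mutate(across(everything(), as.factor)) |> 
  FactoMineR::MCA(graph = FALSE)

# Variable categories: the ten with the highest contribution, colored by contribution
factoextra::fviz_mca_var(res_mca_02, 
             select.var = list(contrib = 10),  # only the top ten categories
             repel = TRUE, ggtheme = theme_minimal(),
             col.var = "contrib",
             # choice = "mca.cor",
             gradient.cols = c("#bdbdbd", "#ffeda0", "#FC4E07"))

# Individuals colored by sex and by ae_grade_3_4, with concentration ellipses
factoextra::fviz_ellipses(res_mca_02, c("sex", "ae_grade_3_4"),
              geom = "point")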
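To make explicit what MCA() computes, the sketch below performs MCA as a correspondence analysis of the complete disjunctive (indicator) table, using only base R on hypothetical data. It is a simplified illustration, not a re-implementation of FactoMineR: it has no supplementary elements and a different output structure. Its eigenvalues should nevertheless correspond to the indicator-matrix eigenvalues that MCA() reports. A category's contribution to a dimension is its mass times its squared principal coordinate, divided by the dimension's eigenvalue. This is the quantity ranked by select.var = list(contrib = 10).

```r
# Minimal MCA sketch: correspondence analysis of the indicator matrix (base R)
mca_sketch <- function(df) {
  df <- as.data.frame(lapply(df, factor))        # all columns as factors
  Z <- do.call(cbind, lapply(names(df), function(v) {
    m <- outer(as.character(df[[v]]), levels(df[[v]]), "==") * 1
    colnames(m) <- paste(v, levels(df[[v]]), sep = "_")
    m
  }))
  P        <- Z / sum(Z)                         # correspondence matrix
  row_mass <- rowSums(P)                         # 1/n for every individual
  col_mass <- colSums(P)                         # category masses
  S  <- diag(1 / sqrt(row_mass)) %*% (P - row_mass %o% col_mass) %*%
        diag(1 / sqrt(col_mass))                 # standardized residuals
  sv <- svd(S)
  k  <- sum(sv$d > 1e-10)                        # non-trivial dimensions
  eig <- sv$d[seq_len(k)]^2                      # principal inertias
  # principal coordinates of the categories
  coord <- diag(1 / sqrt(col_mass)) %*% sv$v[, seq_len(k), drop = FALSE] %*%
           diag(sv$d[seq_len(k)], k)
  # contribution (%) of each category to each dimension
  contrib <- 100 * sweep(col_mass * coord^2, 2, eig, "/")
  rownames(coord) <- rownames(contrib) <- colnames(Z)
  list(eig = eig, coord = coord, contrib = contrib,
       n_var = ncol(df), n_cat = ncol(Z))
}

# Hypothetical toy data (NOT the study data)
toy <- data.frame(
  sex          = c("M", "F", "M", "F", "M", "M", "F", "F"),
  ae_grade_3_4 = c(1, 0, 1, 1, 0, 1, 0, 0),
  ps_ecog      = c(0, 0, 1, 1, 0, 1, 0, 1)
)
res <- mca_sketch(toy)
res$eig
head(sort(res$contrib[, 1], decreasing = TRUE), 10)  # top categories, dim 1
```

With three variables and six categories, the eigenvalues sum to J/Q − 1 = 6/3 − 1 = 1, a known property of MCA on the indicator matrix.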

5.3 I. section

5.3.1 Logistic model

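The logistic regression analyses explored associations between selected independent variables and adverse outcomes related to treatment tolerability. Each model is univariable: one outcome and one predictor, fitted separately for every outcome–predictor pair.

  • Monotherapy was significantly associated with treatment discontinuation due to adverse events (discontinued_due_to_ae), with an odds ratio (OR) of 5.33 and a p-value of 0.043.
  • BMI was significantly associated with the occurrence of grade 3–4 hypertriglyceridemia (ae_hyperTAG_grade_3_4), with an OR of 1.23 per one-unit increase in BMI and a p-value of 0.01.

No other associations reached statistical significance at the 0.05 level. The odds ratio for \(thyroid\_disease\_before\) in the \(ae\_hyperTAG\_any\) model is not interpretable because of separation (see Notes). Because 50 unadjusted tests were run (5 outcomes × 10 predictors), about 2 or 3 nominally significant results would be expected by chance even if no true association existed. BMI and treatment regimen may therefore be relevant predictors of specific toxicity outcomes, but these findings should be regarded as hypothesis-generating.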
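One way to account for the 50 univariable tests is to adjust the nominal p-values for multiplicity. The sketch applies Holm (family-wise error) and Benjamini–Hochberg (false discovery rate) adjustment to the two nominally significant p-values from the table in this section. The argument n = 50 tells p.adjust() that the family contains 50 comparisons. The choice of adjustment method is illustrative; it was not part of the original analysis.

```r
# Nominal p-values of the two "significant" univariable models (table below)
p_nominal <- c(bmi_on_hyperTAG_3_4 = 0.010, monotherapy_on_disc_ae = 0.043)

# 5 outcomes x 10 predictors = 50 tests in the family
p_holm <- p.adjust(p_nominal, method = "holm", n = 50)
p_bh   <- p.adjust(p_nominal, method = "BH",   n = 50)

rbind(nominal = p_nominal, holm = p_holm, BH = p_bh)
```

Under either adjustment, neither result remains below 0.05 (0.5 and 1 in both cases).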

The model equation: \[ \begin{aligned} \text{Dependent variable} &\sim \operatorname{Bernoulli}\left(\Pr(\text{Dependent variable} = 1) = \hat{P}\right) \\ \log\left[ \frac{\hat{P}}{1 - \hat{P}} \right] &= \alpha + \beta_{1}\,(\text{Independent variable}) \end{aligned} \]
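Because the model is linear on the log-odds scale, \(e^{\beta_1}\) is the odds ratio reported in the tables. The sketch below uses hypothetical 2×2 counts, not study data. It shows that for a binary predictor, exp(β₁) from glm() equals the cross-product ratio ad/(bc). For a continuous predictor, the OR for a k-unit increase is OR^k = exp(kβ₁). Applied to the BMI estimate of 1.225 per unit from the table in this section, a 5-unit increase corresponds to an OR of about 2.76.

```r
# Hypothetical 2x2 counts (NOT study data)
a  <- 8;  b <- 4     # predictor = 1: outcome 1 / outcome 0
c0 <- 5;  d <- 10    # predictor = 0: outcome 1 / outcome 0
dat <- data.frame(
  x = c(rep(1, a + b), rep(0, c0 + d)),
  y = c(rep(1, a), rep(0, b), rep(1, c0), rep(0, d))
)
fit <- glm(y ~ x, data = dat, family = binomial)

or_glm   <- exp(coef(fit)[["x"]])   # exp(beta1)
or_cross <- (a * d) / (b * c0)      # cross-product ratio, here 4
c(or_glm = or_glm, or_cross = or_cross)

# Continuous predictor: OR per k units = OR_per_unit^k = exp(k * beta1)
or_bmi_per_unit <- 1.225            # BMI vs. grade 3-4 hypertriglyceridemia
or_bmi_5 <- or_bmi_per_unit^5       # about 2.76 per 5 kg/m^2
or_bmi_5
```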

# Univariable logistic regression for every (dependent, independent) pair:
# reshape to long format, nest by pair, fit one glm() per pair
res_logist_02 <- d04 |> 
  select(all_of(c(variab_indep_01, variab_dep_01))) |> 
  mutate(sex = if_else(sex == "M", 1, 0)) |>   # male = 1, female = 0
  pivot_longer(
    cols = all_of(variab_dep_01),
    names_to = "dep_name",
    values_to = "dep_value"
  ) |> 
  pivot_longer(cols = all_of(variab_indep_01),
               names_to = "indep_name",
               values_to = "indep_value") |> 
  group_by(dep_name, indep_name) |> 
  nest() |> 
  mutate(mod = map(data, ~glm(dep_value ~ indep_value, 
                              data = .x, family = binomial)),
         # odds ratios with 95% confidence intervals
         tidier = map(mod, ~tidy(.x, conf.int = TRUE, exponentiate = TRUE)))

res_logist_02_tab <- res_logist_02 |> 
  unnest(tidier) |> 
  filter(term == "indep_value") |>             # drop the intercept
  select(-data, -mod, -term) |> 
  mutate(across(where(is.numeric), ~round(.x, 3)))

res_logist_02_tab |> 
  select(-std.error, -statistic) |> 
  kable(col.names = c("Dependent", "Independent", "Odds ratio", "p-value",
                      "95% CI lower", "95% CI upper")) |> 
  kable_styling(
    full_width = FALSE,
    latex_options = c(
      "hold_position" # stop table floating
    ),
    bootstrap_options = c("striped", "hover", "condensed", "responsive")
  ) |> 
  collapse_rows(columns = 1, valign = "top")
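| Dependent | Independent | Odds ratio | p-value | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|
| discontinued_due_to_ae | age | 0.959 | 0.180 | 0.898 | 1.018 |
| discontinued_due_to_ae | sex | 0.769 | 0.730 | 0.168 | 3.483 |
| discontinued_due_to_ae | bmi | 0.986 | 0.832 | 0.856 | 1.115 |
| discontinued_due_to_ae | ps_ecog | 0.389 | 0.293 | 0.051 | 2.008 |
| discontinued_due_to_ae | stage_early | 1.029 | 0.976 | 0.127 | 6.142 |
| discontinued_due_to_ae | first_syst_th | 3.167 | 0.174 | 0.590 | 17.688 |
| discontinued_due_to_ae | response_achieved | 2.139 | 0.346 | 0.464 | 11.902 |
| discontinued_due_to_ae | dyslipidemia_before | 1.543 | 0.612 | 0.261 | 8.198 |
| discontinued_due_to_ae | thyroid_disease_before | 0.741 | 0.806 | 0.034 | 6.744 |
| discontinued_due_to_ae | monotherapy | 5.333 | 0.043 | 1.133 | 31.023 |
| ae_hyperTAG_any | age | 0.965 | 0.425 | 0.869 | 1.041 |
| ae_hyperTAG_any | sex | 2.500 | 0.345 | 0.374 | 20.680 |
| ae_hyperTAG_any | bmi | 1.165 | 0.220 | 0.948 | 1.551 |
| ae_hyperTAG_any | ps_ecog | 0.643 | 0.651 | 0.094 | 5.351 |
| ae_hyperTAG_any | stage_early | 0.727 | 0.791 | 0.086 | 15.538 |
| ae_hyperTAG_any | first_syst_th | 0.848 | 0.890 | 0.104 | 17.941 |
| ae_hyperTAG_any | response_achieved | 1.667 | 0.627 | 0.184 | 15.110 |
| ae_hyperTAG_any | dyslipidemia_before | 1.333 | 0.807 | 0.171 | 27.712 |
| ae_hyperTAG_any | thyroid_disease_before | 16521256.181 | 0.995 | 0.000 | NA |
| ae_hyperTAG_any | monotherapy | 0.359 | 0.291 | 0.043 | 2.408 |
| ae_hyperTAG_grade_3_4 | age | 0.994 | 0.802 | 0.949 | 1.042 |
| ae_hyperTAG_grade_3_4 | sex | 1.600 | 0.457 | 0.471 | 5.778 |
| ae_hyperTAG_grade_3_4 | bmi | 1.225 | 0.010 | 1.069 | 1.465 |
| ae_hyperTAG_grade_3_4 | ps_ecog | 0.769 | 0.694 | 0.197 | 2.787 |
| ae_hyperTAG_grade_3_4 | stage_early | 0.219 | 0.179 | 0.011 | 1.460 |
| ae_hyperTAG_grade_3_4 | first_syst_th | 0.880 | 0.874 | 0.161 | 4.152 |
| ae_hyperTAG_grade_3_4 | response_achieved | 0.982 | 0.977 | 0.285 | 3.451 |
| ae_hyperTAG_grade_3_4 | dyslipidemia_before | 1.346 | 0.671 | 0.328 | 5.382 |
| ae_hyperTAG_grade_3_4 | thyroid_disease_before | 7.429 | 0.085 | 0.983 | 153.017 |
| ae_hyperTAG_grade_3_4 | monotherapy | 0.727 | 0.616 | 0.201 | 2.493 |
| ae_liver | age | 0.967 | 0.179 | 0.918 | 1.016 |
| ae_liver | sex | 0.449 | 0.232 | 0.117 | 1.664 |
| ae_liver | bmi | 1.009 | 0.886 | 0.890 | 1.134 |
| ae_liver | ps_ecog | 0.573 | 0.461 | 0.111 | 2.347 |
| ae_liver | stage_early | 0.945 | 0.951 | 0.122 | 5.168 |
| ae_liver | first_syst_th | 0.298 | 0.281 | 0.015 | 1.953 |
| ae_liver | response_achieved | 2.745 | 0.179 | 0.682 | 14.052 |
| ae_liver | dyslipidemia_before | 0.465 | 0.375 | 0.064 | 2.203 |
| ae_liver | thyroid_disease_before | 4.500 | 0.126 | 0.658 | 38.109 |
| ae_liver | monotherapy | 0.386 | 0.203 | 0.076 | 1.545 |
| ae_hemato | age | 1.001 | 0.985 | 0.942 | 1.073 |
| ae_hemato | sex | 0.438 | 0.322 | 0.077 | 2.259 |
| ae_hemato | bmi | 0.895 | 0.257 | 0.717 | 1.058 |
| ae_hemato | ps_ecog | 0.867 | 0.875 | 0.113 | 4.690 |
| ae_hemato | stage_early | 1.067 | 0.956 | 0.051 | 8.383 |
| ae_hemato | first_syst_th | 0.738 | 0.793 | 0.036 | 5.376 |
| ae_hemato | response_achieved | 1.705 | 0.554 | 0.320 | 13.007 |
| ae_hemato | dyslipidemia_before | 0.467 | 0.504 | 0.023 | 3.226 |
| ae_hemato | thyroid_disease_before | 1.417 | 0.772 | 0.066 | 11.916 |
| ae_hemato | monotherapy | 0.613 | 0.587 | 0.081 | 3.264 |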

5.4 II. section

In the second part, we evaluated efficacy and the factors that may influence it.

5.4.1 Logistic regression

Without adjustment for early stage

In the table below, none of the selected independent variables was significantly associated with \(response\_achieved\) (all p ≥ 0.29).

The model equation: \[ \begin{aligned} \text{Dependent variable} &\sim \operatorname{Bernoulli}\left(\Pr(\text{Dependent variable} = 1) = \hat{P}\right) \\ \log\left[ \frac{\hat{P}}{1 - \hat{P}} \right] &= \alpha + \beta_{1}\,(\text{Independent variable}) \end{aligned} \]

# Univariable logistic regression of response_achieved on each independent variable
res_logist_03 <- d04 |> 
  select(all_of(c(variab_indep_02, variab_dep_02a))) |> 
  mutate(sex = if_else(sex == "M", 1, 0)) |>   # male = 1, female = 0
  pivot_longer(
    cols = all_of(variab_dep_02a),
    names_to = "dep_name",
    values_to = "dep_value"
  ) |> 
  pivot_longer(cols = all_of(variab_indep_02),
               names_to = "indep_name",
               values_to = "indep_value") |> 
  group_by(dep_name, indep_name) |> 
  nest() |> 
  mutate(mod = map(data, ~glm(dep_value ~ indep_value, 
                              data = .x, family = binomial)),
         tidier = map(mod, ~tidy(.x, conf.int = TRUE, exponentiate = TRUE)))

res_logist_03_tab <- res_logist_03 |> 
  unnest(tidier) |> 
  filter(term == "indep_value") |>             # drop the intercept
  select(-data, -mod, -term) |> 
  mutate(across(where(is.numeric), ~round(.x, 3)))

res_logist_03_tab |> 
  select(-std.error, -statistic) |> 
  kable(col.names = c("Dependent", "Independent", "Odds ratio", "p-value",
                      "95% CI lower", "95% CI upper")) |> 
  kable_styling(
    full_width = FALSE,
    latex_options = c(
      "hold_position" # stop table floating
    ),
    bootstrap_options = c("striped", "hover", "condensed", "responsive")
  ) |> 
  collapse_rows(columns = 1, valign = "top")
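| Dependent | Independent | Odds ratio | p-value | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|
| response_achieved | age | 0.980 | 0.430 | 0.929 | 1.028 |
| response_achieved | sex | 1.018 | 0.977 | 0.290 | 3.503 |
| response_achieved | bmi | 0.941 | 0.290 | 0.834 | 1.052 |
| response_achieved | ps_ecog | 1.011 | 0.988 | 0.270 | 4.023 |
| response_achieved | first_syst_th | 2.143 | 0.388 | 0.425 | 16.023 |
| response_achieved | ae_grade_3_4 | 1.786 | 0.355 | 0.528 | 6.306 |
| response_achieved | monotherapy | 1.650 | 0.449 | 0.464 | 6.415 |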

With adjustment for early stage

In the table below, none of the selected independent variables was significantly associated with \(response\_achieved\), even after adjustment for \(stage\_early\).
Models that additionally included an interaction between each independent variable and \(stage\_early\) (\(indep\_value \times stage\_early\)) did not reveal any significant associations either (results not shown).

The model equation: \[ \begin{aligned} \text{Dependent variable} &\sim \operatorname{Bernoulli}\left(\Pr(\text{Dependent variable} = 1) = \hat{P}\right) \\ \log\left[ \frac{\hat{P}}{1 - \hat{P}} \right] &= \alpha + \beta_{1}\,(\text{Independent variable}) + \beta_{2}\,(\text{Stage early}) \end{aligned} \]

# Logistic models for response_achieved: adjusted for stage_early (mod) and
# with an independent variable x stage_early interaction (mod2)
res_logist_04 <- d04 |> 
  select(stage_early, all_of(c(variab_indep_02, variab_dep_02a))) |> 
  mutate(sex = if_else(sex == "M", 1, 0)) |>   # male = 1, female = 0
  pivot_longer(
    cols = all_of(variab_dep_02a),
    names_to = "dep_name",
    values_to = "dep_value"
  ) |> 
  pivot_longer(cols = all_of(variab_indep_02),
               names_to = "indep_name",
               values_to = "indep_value") |> 
  group_by(dep_name, indep_name) |> 
  nest() |> 
  mutate(mod = map(data, ~glm(dep_value ~ indep_value + stage_early, 
                              data = .x, family = binomial)),
         tidier = map(mod, ~tidy(.x, conf.int = TRUE, exponentiate = TRUE)),
         mod2 = map(data, ~glm(dep_value ~ indep_value * stage_early, 
                               data = .x, family = binomial)),
         tidier2 = map(mod2, ~tidy(.x, conf.int = TRUE, exponentiate = TRUE))
  )

res_logist_04_tab_a <- res_logist_04 |> 
  unnest(tidier) |> 
  filter(!str_detect(term, "Intercept")) |>   # keep indep_value and stage_early terms
  select(-data, -mod, -mod2, -tidier2) |> 
  mutate(across(where(is.numeric), ~round(.x, 3)))

# Columns: dependent, independent variable, model term, OR, p, CI bounds
res_logist_04_tab_a |> 
  select(-std.error, -statistic) |> 
  kable(col.names = c("Dependent", "Independent variable", "Term", "Odds ratio",
                      "p-value", "95% CI lower", "95% CI upper")) |> 
  kable_styling(
    full_width = FALSE,
    latex_options = c(
      "hold_position" # stop table floating
    ),
    bootstrap_options = c("striped", "hover", "condensed", "responsive")
  ) |> 
  collapse_rows(columns = 1:2, valign = "top")
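| Dependent | Independent variable | Term | Odds ratio | p-value | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|
| response_achieved | age | indep_value | 0.987 | 0.608 | 0.933 | 1.038 |
| response_achieved | age | stage_early | 1.735 | 0.543 | 0.321 | 13.324 |
| response_achieved | sex | indep_value | 1.144 | 0.844 | 0.292 | 4.462 |
| response_achieved | sex | stage_early | 1.917 | 0.503 | 0.309 | 16.338 |
| response_achieved | bmi | indep_value | 0.942 | 0.303 | 0.834 | 1.054 |
| response_achieved | bmi | stage_early | 1.621 | 0.597 | 0.294 | 12.562 |
| response_achieved | ps_ecog | indep_value | 1.145 | 0.844 | 0.299 | 4.676 |
| response_achieved | ps_ecog | stage_early | 1.833 | 0.507 | 0.336 | 14.230 |
| response_achieved | first_syst_th | indep_value | 2.015 | 0.468 | 0.330 | 17.327 |
| response_achieved | first_syst_th | stage_early | 1.319 | 0.780 | 0.189 | 11.460 |
| response_achieved | ae_grade_3_4 | indep_value | 1.840 | 0.346 | 0.525 | 6.805 |
| response_achieved | ae_grade_3_4 | stage_early | 2.089 | 0.426 | 0.375 | 16.647 |
| response_achieved | monotherapy | indep_value | 1.590 | 0.528 | 0.385 | 7.266 |
| response_achieved | monotherapy | stage_early | 1.366 | 0.755 | 0.199 | 11.991 |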
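A formal way to check whether an interaction with \(stage\_early\) improves a model is a likelihood-ratio test that compares the additive model with the interaction model. The sketch below runs this comparison on simulated data. The variable names mirror the document, but the data, effect sizes, and the larger sample size (chosen to keep the fits stable) are assumptions for illustration only.

```r
# Simulated data (NOT the study data): binary response, age, early-stage flag
set.seed(42)
n <- 200
sim <- data.frame(
  stage_early = rbinom(n, 1, 0.2),
  age         = round(rnorm(n, mean = 65, sd = 13))
)
sim$response_achieved <- rbinom(n, 1, plogis(0.5 - 0.02 * (sim$age - 65)))

# Additive model (adjusted for stage_early) and the interaction model
m_add <- glm(response_achieved ~ age + stage_early, data = sim, family = binomial)
m_int <- glm(response_achieved ~ age * stage_early, data = sim, family = binomial)

# Likelihood-ratio test of the single interaction coefficient (1 df)
lrt <- anova(m_add, m_int, test = "Chisq")
lrt

# Odds ratios with Wald 95% CIs from the additive model
exp(cbind(OR = coef(m_add), confint.default(m_add)))
```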

5.4.2 Survival analyses

A Kaplan–Meier analysis estimates, over time, the probability that the event of interest (e.g., death or relapse) has not yet occurred in a cohort. It properly accounts for censored observations (subjects lost to follow-up or event-free at last contact). The result is a stepwise survival curve that drops at each event time, giving a clear visualization of time-to-event data.
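The product-limit idea can be written out in a few lines. At each distinct event time t, the survival estimate is multiplied by (1 − events at t / number at risk just before t). Censored subjects count as at risk up to and including their censoring time. The sketch below implements this by hand on hypothetical follow-up data and compares the result with survival::survfit().

```r
library(survival)

# Product-limit (Kaplan-Meier) estimator written out by hand
km_manual <- function(time, status) {
  ev_times <- sort(unique(time[status == 1]))               # distinct event times
  n_risk   <- sapply(ev_times, function(t) sum(time >= t))  # at risk just before t
  n_event  <- sapply(ev_times, function(t) sum(time == t & status == 1))
  data.frame(time = ev_times, n_risk = n_risk, n_event = n_event,
             surv = cumprod(1 - n_event / n_risk))
}

# Hypothetical follow-up in months (status 1 = event, 0 = censored)
time   <- c(2, 3, 3, 5, 6, 8, 8, 12, 15, 20)
status <- c(1, 1, 0, 1, 1, 0, 1, 1, 0, 1)

km  <- km_manual(time, status)
fit <- survfit(Surv(time, status) ~ 1)
cbind(km, survfit = summary(fit)$surv)   # identical survival columns
```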

A Cox proportional hazards (PH) model estimates the effect of one or more covariates on the hazard (the instantaneous event rate) without requiring the baseline hazard function to be specified. It yields hazard ratios (HRs), i.e., the ratio of hazards associated with a one-unit change in a covariate (or with one category versus the reference). The model assumes that this ratio is constant over time (proportional hazards).
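A minimal Cox PH fit on hypothetical data shows where the HR comes from (exp of the coefficient). It also shows one common check of the proportional-hazards assumption, the Schoenfeld-residual test in survival::cox.zph(). The data and the binary covariate are made up for illustration.

```r
library(survival)

# Hypothetical data (NOT the study data): months, event flag, binary covariate
d_cox <- data.frame(
  time   = c(2, 3, 3, 5, 6, 8, 8, 12, 15, 20),
  status = c(1, 1, 0, 1, 1, 0, 1, 1, 0, 1),
  group  = c(0, 1, 0, 1, 0, 1, 0, 1, 1, 0)
)

fit <- coxph(Surv(time, status) ~ group, data = d_cox)
hr  <- exp(coef(fit))        # hazard ratio, group 1 vs. group 0
ci  <- exp(confint(fit))     # 95% Wald CI on the HR scale
zph <- cox.zph(fit)          # test of the proportional-hazards assumption

list(hr = hr, ci = ci, ph_test = zph$table)
```

A small p-value in the cox.zph() table would suggest that the hazard ratio changes over time, i.e., that the PH assumption is questionable.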

Time to next treatment

Kaplan-Meier Overall

A single Kaplan–Meier curve summarizes the entire cohort’s survival experience:
- Median survival time: The earliest time at which the estimated survival probability falls to 50% or below.
- Fixed-time survival rates: Survival probabilities at landmarks such as 1 year or 5 years.
- Numbers at risk/censoring: Often displayed below the curve to show how many subjects remain under observation at each interval.
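The quantities listed above can be read from a survival::survfit() object. The sketch below uses hypothetical follow-up data. It extracts the median, a 12-month landmark survival probability, and numbers at risk at chosen time points (as shown in a risk table beneath the curve).

```r
library(survival)

# Hypothetical follow-up in months (status 1 = event, 0 = censored)
time   <- c(2, 3, 3, 5, 6, 8, 8, 12, 15, 20)
status <- c(1, 1, 0, 1, 1, 0, 1, 1, 0, 1)
fit <- survfit(Surv(time, status) ~ 1)

med  <- summary(fit)$table[["median"]]              # first time S(t) <= 0.5
s12  <- summary(fit, times = 12)$surv               # landmark survival at 12 months
risk <- summary(fit, times = c(0, 6, 12))$n.risk    # numbers at risk

c(median = med, surv_12m = s12)
risk
```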

res_surv_ttnt_02$surv[[1]]$`KM overall`

Kaplan-Meier Stratified

When you divide the cohort into subgroups (e.g., treatment vs. control, biomarker high vs. low), you generate separate curves to compare survival patterns:
- Overlaid curves: Different line styles or colors distinguish each subgroup.
- Dashed line at 50% survival: The point where each curve crosses this line marks the median survival time of that subgroup, allowing visual comparison.
- Log-rank test p-value: Assesses whether the observed differences between curves are statistically significant.
- Hazard ratio (optional): From a Cox model, quantifies the relative hazard between strata.
- Group-specific medians and landmark rates: Reported side by side to highlight subgroup differences.
- P-values: The displayed p-value reflects the overall effect of the stratifying variable on the outcome in the whole cohort, without subsetting.
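The log-rank p-value shown on a stratified plot can be computed with survival::survdiff(). The test statistic is compared with a chi-square distribution with (number of groups − 1) degrees of freedom. The sketch uses hypothetical data with two subgroups and also extracts the group-specific medians.

```r
library(survival)

# Hypothetical data (NOT the study data): months, event flag, two subgroups
d_lr <- data.frame(
  time   = c(2, 3, 3, 5, 6, 8, 8, 12, 15, 20),
  status = c(1, 1, 0, 1, 1, 0, 1, 1, 0, 1),
  group  = c("A", "B", "A", "B", "A", "B", "A", "B", "B", "A")
)

# Kaplan-Meier curve per subgroup and group-specific medians
km_by_group <- survfit(Surv(time, status) ~ group, data = d_lr)
summary(km_by_group)$table[, "median"]

# Log-rank test: chi-square with (number of groups - 1) degrees of freedom
lr <- survdiff(Surv(time, status) ~ group, data = d_lr)
k_df <- length(lr$n) - 1
p_logrank <- pchisq(lr$chisq, df = k_df, lower.tail = FALSE)
p_logrank
```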

PS-ECOG
res_surv_ttnt_02$surv[[1]]$`KM stratified`

First Systemic Therapy
res_surv_ttnt_02$surv[[2]]$`KM stratified`

AE Grade 3-4
res_surv_ttnt_02$surv[[3]]$`KM stratified`

Monotherapy
res_surv_ttnt_02$surv[[4]]$`KM stratified`

Gender
res_surv_ttnt_02$surv[[5]]$`KM stratified`

Cox Proportional Hazards

Cox PH Table Output

A standard Cox PH results table typically includes:
- Hazard Ratio (HR): The estimated relative hazard per unit change in each covariate (or for each category versus the reference).
- 95% Confidence Interval (CI): Lower and upper bounds for the HR, indicating precision.
- p-value: Significance of the association between each covariate and the hazard.

Cox PH Plot

To visualize model-based survival differences you can use:
- Dichotomous variable plot: Two survival curves (e.g., exposed vs. unexposed), annotated with HR and p-value from the Cox model.

Forest Plot Analysis

This forest plot displays the effect estimates from a Cox proportional hazards model:

  • Point estimates and confidence intervals (CIs): Each covariate is represented by a square indicating the hazard ratio (HR), with horizontal lines showing the 95% confidence interval.
  • Reference categories: For categorical variables (e.g., stage_early), one level is set as the reference.
  • Continuous variable (var_indep): Modeled directly without categorization, showing the HR per unit increase.
  • Global model metrics: The bottom of the plot displays the number of events, global p-value from the log-rank test, AIC, and concordance index, reflecting model fit and discrimination.

This approach allows for a more precise estimation of the continuous variable’s effect without loss of information due to categorization.

PS-ECOG
res_surv_ttnt_02$surv[[1]]$`CoxPH print table`
| Characteristic | HR | 95% CI | p-value |
|---|---|---|---|
| var_indep_group | | | |
| ps_ecog 0 - stage early 0 | (reference) | | |
| ps_ecog 0 - stage early 1 | 0.23 | 0.07, 0.83 | 0.025 |
| ps_ecog 1 - stage early 0 | 0.96 | 0.44, 2.08 | >0.9 |
| ps_ecog 1 - stage early 1 | 0.37 | 0.05, 2.91 | 0.3 |

Abbreviations: CI = Confidence Interval, HR = Hazard Ratio
res_surv_ttnt_02$surv[[1]]$`CoxPH plot`
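In these tables, the covariate and \(stage\_early\) are combined into a single four-level factor, and each HR is relative to the first level (covariate 0, stage early 0). For example, "ps_ecog 0 - stage early 1" compares early-stage with advanced-stage patients within ECOG 0. Likewise, "ps_ecog 1 - stage early 0" compares ECOG 1 with ECOG 0 within advanced stage. The helper make_group() and the data below are hypothetical. The sketch shows one way such a factor can be built and fitted; the actual code behind res_surv_ttnt_02 is not shown in this section.

```r
library(survival)

# Hypothetical helper: combine a binary covariate with stage_early into one
# factor whose first level (covariate 0, stage early 0) is the reference
make_group <- function(x, stage_early, name) {
  lab <- function(a, b) paste0(name, " ", a, " - stage early ", b)
  factor(lab(x, stage_early), levels = lab(c(0, 0, 1, 1), c(0, 1, 0, 1)))
}

# Hypothetical data (NOT the study data): four patients per combination
d_grp <- data.frame(
  ps_ecog     = rep(c(0, 0, 1, 1), each = 4),
  stage_early = rep(c(0, 1, 0, 1), each = 4),
  time   = c(3, 5, 8, 12,  10, 15, 20, 30,  4, 6, 9, 14,  12, 18, 25, 28),
  status = c(1, 1, 1, 0,   1, 0, 1, 1,     1, 1, 0, 1,    1, 1, 0, 1)
)
d_grp$grp <- make_group(d_grp$ps_ecog, d_grp$stage_early, "ps_ecog")

fit <- coxph(Surv(time, status) ~ grp, data = d_grp)
exp(coef(fit))   # three HRs, each vs. "ps_ecog 0 - stage early 0"
```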

First Systemic Therapy
res_surv_ttnt_02$surv[[2]]$`CoxPH print table`
| Characteristic | HR | 95% CI | p-value |
|---|---|---|---|
| var_indep_group | | | |
| first_syst_th 0 - stage early 0 | (reference) | | |
| first_syst_th 0 - stage early 1 | 0.85 | 0.25, 2.92 | 0.8 |
| first_syst_th 1 - stage early 0 | 1.57 | 0.54, 4.57 | 0.4 |
| first_syst_th 1 - stage early 1 | 0.08 | 0.01, 0.64 | 0.017 |

Abbreviations: CI = Confidence Interval, HR = Hazard Ratio
res_surv_ttnt_02$surv[[2]]$`CoxPH plot`

AE Grade 3-4
res_surv_ttnt_02$surv[[3]]$`CoxPH print table`
| Characteristic | HR | 95% CI | p-value |
|---|---|---|---|
| var_indep_group | | | |
| ae_grade_3_4 0 - stage early 0 | (reference) | | |
| ae_grade_3_4 0 - stage early 1 | 0.10 | 0.02, 0.49 | 0.004 |
| ae_grade_3_4 1 - stage early 0 | 0.52 | 0.24, 1.11 | 0.092 |
| ae_grade_3_4 1 - stage early 1 | 0.45 | 0.09, 2.26 | 0.3 |

Abbreviations: CI = Confidence Interval, HR = Hazard Ratio
res_surv_ttnt_02$surv[[3]]$`CoxPH plot`

Monotherapy
res_surv_ttnt_02$surv[[4]]$`CoxPH print table`
| Characteristic | HR | 95% CI | p-value |
|---|---|---|---|
| var_indep_group | | | |
| monotherapy 0 - stage early 0 | (reference) | | |
| monotherapy 0 - stage early 1 | 0.43 | 0.05, 3.42 | 0.4 |
| monotherapy 1 - stage early 0 | 0.82 | 0.36, 1.88 | 0.6 |
| monotherapy 1 - stage early 1 | 0.20 | 0.05, 0.77 | 0.019 |

Abbreviations: CI = Confidence Interval, HR = Hazard Ratio
res_surv_ttnt_02$surv[[4]]$`CoxPH plot`

Gender
res_surv_ttnt_02$surv[[5]]$`CoxPH print table`
| Characteristic | HR | 95% CI | p-value |
|---|---|---|---|
| var_indep_group | | | |
| sex 0 - stage early 0 | (reference) | | |
| sex 0 - stage early 1 | 0.18 | 0.04, 0.75 | 0.018 |
| sex 1 - stage early 0 | 0.85 | 0.39, 1.84 | 0.7 |
| sex 1 - stage early 1 | 0.63 | 0.08, 5.01 | 0.7 |

Abbreviations: CI = Confidence Interval, HR = Hazard Ratio
res_surv_ttnt_02$surv[[5]]$`CoxPH plot`

BMI
res_surv_ttnt_02$surv[[6]]$`CoxPH print table`
| Characteristic | HR | 95% CI | p-value |
|---|---|---|---|
| stage_early | 0.26 | 0.08, 0.80 | 0.019 |
| bmi | 1.00 | 0.93, 1.06 | 0.9 |

Abbreviations: CI = Confidence Interval, HR = Hazard Ratio
res_surv_ttnt_02$surv[[6]]$`CoxPH plot full`

res_surv_ttnt_02$surv[[6]]$`CoxPH forest plot full`

Age
res_surv_ttnt_02$surv[[7]]$`CoxPH print table`
| Characteristic | HR | 95% CI | p-value |
|---|---|---|---|
| stage_early | 0.28 | 0.09, 0.87 | 0.027 |
| age | 1.02 | 0.99, 1.04 | 0.3 |

Abbreviations: CI = Confidence Interval, HR = Hazard Ratio
res_surv_ttnt_02$surv[[7]]$`CoxPH plot full`

res_surv_ttnt_02$surv[[7]]$`CoxPH forest plot full`

Treatment duration

Kaplan-Meier Overall

A single Kaplan–Meier curve summarizes the entire cohort’s survival experience:
- Median survival time: The earliest time at which the estimated survival probability falls to 50% or below.
- Fixed-time survival rates: Survival probabilities at landmarks such as 1 year or 5 years.
- Numbers at risk/censoring: Often displayed below the curve to show how many subjects remain under observation at each interval.

res_surv_durTRTeffect_02$surv[[1]]$`KM overall`

Kaplan-Meier Stratified

When you divide the cohort into subgroups (e.g., treatment vs. control, biomarker high vs. low), you generate separate curves to compare survival patterns:
- Overlaid curves: Different line styles or colors distinguish each subgroup.
- Dashed line at 50% survival: The point where each curve crosses this line marks the median survival time of that subgroup, allowing visual comparison.
- Log-rank test p-value: Assesses whether the observed differences between curves are statistically significant.
- Hazard ratio (optional): From a Cox model, quantifies the relative hazard between strata.
- Group-specific medians and landmark rates: Reported side-by-side to highlight subgroup differences.

PS-ECOG
res_surv_durTRTeffect_02$surv[[1]]$`KM stratified`

First Systemic Therapy
res_surv_durTRTeffect_02$surv[[2]]$`KM stratified`

AE Grade 3-4
res_surv_durTRTeffect_02$surv[[3]]$`KM stratified`

Monotherapy
res_surv_durTRTeffect_02$surv[[4]]$`KM stratified`

Gender
res_surv_durTRTeffect_02$surv[[5]]$`KM stratified`

Cox Proportional Hazards

Cox PH Table Output

A standard Cox PH results table typically includes:
- Hazard Ratio (HR): The estimated relative hazard per unit change in each covariate (or for each category versus the reference).
- 95% Confidence Interval (CI): Lower and upper bounds for the HR, indicating precision.
- p-value: Significance of the association between each covariate and the hazard.

Cox PH Plot

To visualize model-based survival differences you can use:
- Dichotomous variable plot: Two survival curves (e.g., exposed vs. unexposed), annotated with HR and p-value from the Cox model.

Forest Plot Analysis

This forest plot displays the effect estimates from a Cox proportional hazards model:

  • Point estimates and confidence intervals (CIs): Each covariate is represented by a square indicating the hazard ratio (HR), with horizontal lines showing the 95% confidence interval.
  • Reference categories: For categorical variables (e.g., stage_early), one level is set as the reference.
  • Continuous variable (var_indep): Modeled directly without categorization, showing the HR per unit increase.
  • Global model metrics: The bottom of the plot displays the number of events, global p-value from the log-rank test, AIC, and concordance index, reflecting model fit and discrimination.

This approach allows for a more precise estimation of the continuous variable’s effect without loss of information due to categorization.

PS-ECOG
res_surv_durTRTeffect_02$surv[[1]]$`CoxPH print table`
| Characteristic | HR | 95% CI | p-value |
|---|---|---|---|
| var_indep_group | | | |
| ps_ecog 0 - stage early 0 | (reference) | | |
| ps_ecog 0 - stage early 1 | 1.24 | 0.48, 3.22 | 0.7 |
| ps_ecog 1 - stage early 0 | 1.47 | 0.66, 3.29 | 0.3 |
| ps_ecog 1 - stage early 1 | 1.11 | 0.14, 8.59 | >0.9 |

Abbreviations: CI = Confidence Interval, HR = Hazard Ratio
res_surv_durTRTeffect_02$surv[[1]]$`CoxPH plot`

First Systemic Therapy
res_surv_durTRTeffect_02$surv[[2]]$`CoxPH print table`
| Characteristic | HR | 95% CI | p-value |
|---|---|---|---|
| var_indep_group | | | |
| first_syst_th 0 - stage early 0 | (reference) | | |
| first_syst_th 0 - stage early 1 | 1.68 | 0.49, 5.72 | 0.4 |
| first_syst_th 1 - stage early 0 | 2.44 | 0.82, 7.25 | 0.11 |
| first_syst_th 1 - stage early 1 | 0.96 | 0.33, 2.84 | >0.9 |

Abbreviations: CI = Confidence Interval, HR = Hazard Ratio
res_surv_durTRTeffect_02$surv[[2]]$`CoxPH plot`

AE Grade 3-4
res_surv_durTRTeffect_02$surv[[3]]$`CoxPH print table`
| Characteristic | HR | 95% CI | p-value |
|---|---|---|---|
| var_indep_group | | | |
| ae_grade_3_4 0 - stage early 0 | (reference) | | |
| ae_grade_3_4 0 - stage early 1 | 1.07 | 0.36, 3.17 | >0.9 |
| ae_grade_3_4 1 - stage early 0 | 1.35 | 0.59, 3.09 | 0.5 |
| ae_grade_3_4 1 - stage early 1 | 2.20 | 0.47, 10.3 | 0.3 |

Abbreviations: CI = Confidence Interval, HR = Hazard Ratio
res_surv_durTRTeffect_02$surv[[3]]$`CoxPH plot`

Monotherapy
res_surv_durTRTeffect_02$surv[[4]]$`CoxPH print table`
| Characteristic | HR | 95% CI | p-value |
|---|---|---|---|
| var_indep_group | | | |
| monotherapy 0 - stage early 0 | (reference) | | |
| monotherapy 0 - stage early 1 | 0.84 | 0.11, 6.42 | 0.9 |
| monotherapy 1 - stage early 0 | 1.18 | 0.51, 2.74 | 0.7 |
| monotherapy 1 - stage early 1 | 1.18 | 0.46, 3.02 | 0.7 |

Abbreviations: CI = Confidence Interval, HR = Hazard Ratio
res_surv_durTRTeffect_02$surv[[4]]$`CoxPH plot`

Gender
res_surv_durTRTeffect_02$surv[[5]]$`CoxPH print table`
| Characteristic | HR | 95% CI | p-value |
|---|---|---|---|
| var_indep_group | | | |
| sex 0 - stage early 0 | Ref. | | |
| sex 0 - stage early 1 | 0.85 | 0.30, 2.41 | 0.8 |
| sex 1 - stage early 0 | 0.76 | 0.33, 1.72 | 0.5 |
| sex 1 - stage early 1 | 1.17 | 0.15, 9.48 | 0.9 |

Abbreviations: CI = Confidence Interval, HR = Hazard Ratio
res_surv_durTRTeffect_02$surv[[5]]$`CoxPH plot`

BMI
res_surv_durTRTeffect_02$surv[[6]]$`CoxPH print table`
| Characteristic | HR | 95% CI | p-value |
|---|---|---|---|
| stage_early | 1.16 | 0.49, 2.78 | 0.7 |
| bmi | 1.04 | 0.97, 1.11 | 0.3 |

Abbreviations: CI = Confidence Interval, HR = Hazard Ratio
res_surv_durTRTeffect_02$surv[[6]]$`CoxPH plot full`

res_surv_durTRTeffect_02$surv[[6]]$`CoxPH forest plot full`

Age
res_surv_durTRTeffect_02$surv[[7]]$`CoxPH print table`
| Characteristic | HR | 95% CI | p-value |
|---|---|---|---|
| stage_early | 1.14 | 0.48, 2.66 | 0.8 |
| age | 1.04 | 1.00, 1.08 | 0.028 |

Abbreviations: CI = Confidence Interval, HR = Hazard Ratio
res_surv_durTRTeffect_02$surv[[7]]$`CoxPH plot full`

res_surv_durTRTeffect_02$surv[[7]]$`CoxPH forest plot full`

Disease Progression & Response Time

Kaplan-Meier Overall

A single Kaplan–Meier curve summarizes the entire cohort’s survival experience:
- Median survival time: The earliest time at which the estimated survival probability falls to 50% or below.
- Fixed-time survival rates: Survival probabilities at landmarks such as 1 year or 5 years.
- Numbers at risk/censoring: Often displayed below the curve to show how many subjects remain under observation at each interval.
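The following minimal sketch shows one way to extract these quantities in R. It uses the survival package and its bundled `lung` example data, not the study data. The landmark times of 365 and 730 days are illustrative assumptions.

```r
library(survival)

# Example data: 'lung' from the survival package
# (time in days; status 1 = censored, 2 = dead).
fit_overall <- survfit(Surv(time, status) ~ 1, data = lung)

# Median survival time with its 95% CI
summary(fit_overall)$table[c("median", "0.95LCL", "0.95UCL")]

# Fixed-time survival rates and numbers at risk at landmark times (days)
landmarks <- summary(fit_overall, times = c(365, 730))
data.frame(time     = landmarks$time,
           n_risk   = landmarks$n.risk,
           survival = landmarks$surv,
           lower    = landmarks$lower,
           upper    = landmarks$upper)
```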

res_surv_response_02$surv[[1]]$`KM overall`

Kaplan-Meier Stratified

When you divide the cohort into subgroups (e.g., treatment vs. control, biomarker high vs. low), you generate separate curves to compare survival patterns:
- Overlaid curves: Different line styles or colors distinguish each subgroup.
- Dashed line at 50% survival: A horizontal reference line. The time at which each curve crosses it is that group's median survival time.
- Log-rank test p-value: Assesses whether the observed differences between curves are statistically significant.
- Hazard ratio (optional): From a Cox model, quantifies the relative hazard between strata.
- Group-specific medians and landmark rates: Reported side-by-side to highlight subgroup differences.
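The next sketch covers a stratified analysis. It again assumes the `lung` example data from the survival package and uses sex as a stand-in grouping variable. It obtains group-specific medians and the log-rank p-value.

```r
library(survival)

# Separate Kaplan-Meier curves per group (sex: 1 = male, 2 = female in 'lung')
fit_strat <- survfit(Surv(time, status) ~ sex, data = lung)

# Group-specific event counts and medians with 95% CIs
summary(fit_strat)$table[, c("events", "median", "0.95LCL", "0.95UCL")]

# Log-rank test; df = number of groups - 1
lr <- survdiff(Surv(time, status) ~ sex, data = lung)
lr_p <- pchisq(lr$chisq, df = length(lr$n) - 1, lower.tail = FALSE)
lr_p
```

For plotting, `survminer::ggsurvplot()` can draw the dashed 50% reference line through `surv.median.line = "hv"` and print the log-rank p-value with `pval = TRUE`.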

PS-ECOG
res_surv_response_02$surv[[1]]$`KM stratified`

First Systemic Therapy
res_surv_response_02$surv[[2]]$`KM stratified`

AE Grade 3-4
res_surv_response_02$surv[[3]]$`KM stratified`

Monotherapy
res_surv_response_02$surv[[4]]$`KM stratified`

Gender
res_surv_response_02$surv[[5]]$`KM stratified`

Cox Proportional Hazards

Cox PH Table Output

A standard Cox PH results table typically includes:
- Hazard Ratio (HR): The estimated relative hazard per unit change in a continuous covariate, or relative to the reference level of a categorical covariate.
- 95% Confidence Interval (CI): Lower and upper bounds for the HR, indicating precision.
- p-value: Significance of the association between each covariate and the hazard.
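These columns come from the Cox coefficients. The HR is the exponentiated log-hazard coefficient, its 95% CI is the exponentiated Wald interval, and the p-value is from the Wald test. The sketch below reproduces them by hand for a model fitted to the `lung` example data, which serves only as an illustrative stand-in. The report's own tables may come from a different formatting pipeline, so their rounding and layout can differ.

```r
library(survival)

fit_cox <- coxph(Surv(time, status) ~ sex + age, data = lung)

beta <- coef(fit_cox)                 # log hazard ratios
se   <- sqrt(diag(vcov(fit_cox)))     # their standard errors
z    <- qnorm(0.975)                  # ~1.96 for a 95% interval

cox_table <- data.frame(
  HR      = exp(beta),
  CI_low  = exp(beta - z * se),
  CI_high = exp(beta + z * se),
  p_value = 2 * pnorm(-abs(beta / se))  # two-sided Wald test
)
round(cox_table, 3)
```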

Cox PH Plot

To visualize model-based survival differences you can use:
- Dichotomous variable plot: Two survival curves (e.g., exposed vs. unexposed), annotated with HR and p-value from the Cox model.
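A model-based version of this plot can be built by predicting a survival curve from the fitted Cox model for each level of the dichotomous variable. The minimal sketch below assumes the `lung` example data and uses sex as the binary covariate.

```r
library(survival)

fit_sex <- coxph(Surv(time, status) ~ sex, data = lung)

# One row per group -> one predicted survival curve per group
new_groups <- data.frame(sex = c(1, 2))
curves <- survfit(fit_sex, newdata = new_groups)

# Model-based survival probability at ~1 year (365 days) for each group
s1y <- summary(curves, times = 365)$surv
s1y
```

The two curves can then be drawn with `plot(curves)`, with the HR and p-value from `summary(fit_sex)` added as annotations.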

Forest Plot Analysis

This forest plot displays the effect estimates from a Cox proportional hazards model:

  • Point estimates and confidence intervals (CIs): Each covariate is represented by a square indicating the hazard ratio (HR), with horizontal lines showing the 95% confidence interval.
  • Reference categories: For categorical variables (e.g., stage_early), one level is set as the reference.
  • Continuous variable (var_indep): Modeled directly without categorization, showing the HR per unit increase.
  • Global model metrics: The bottom of the plot displays the number of events, global p-value from the log-rank test, AIC, and concordance index, reflecting model fit and discrimination.

This approach allows for a more precise estimation of the continuous variable’s effect without loss of information due to categorization.
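Row labels such as `ps_ecog 0 - stage early 0` in the tables of this report suggest that each independent variable was combined with `stage_early` into a single grouping factor, with the `0 - 0` combination as the reference. Assuming that construction, the approach could be sketched as follows. The names `var_a` and `var_b` and the use of the `lung` example data are hypothetical stand-ins.

```r
library(survival)

# Hypothetical stand-ins for two binary variables, taken from 'lung'
d <- subset(lung, !is.na(ph.ecog))
d$var_a <- d$sex - 1                    # 0/1
d$var_b <- as.integer(d$ph.ecog >= 1)   # 0/1

# Combine both binaries into one 4-level factor; "0 - 0" is the reference
d$group <- factor(paste0("var_a ", d$var_a, " - var_b ", d$var_b))
d$group <- relevel(d$group, ref = "var_a 0 - var_b 0")

fit_grp <- coxph(Surv(time, status) ~ group, data = d)
exp(coef(fit_grp))  # HR of each combination vs. the reference group
```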

PS-ECOG
res_surv_response_02$surv[[1]]$`CoxPH print table`
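| Characteristic | HR | 95% CI | p-value |
|---|---|---|---|
| var_indep_group | | | |
| ps_ecog 0 - stage early 0 | Ref. | | |
| ps_ecog 0 - stage early 1 | 0.67 | 0.18, 2.49 | 0.6 |
| ps_ecog 1 - stage early 0 | 1.42 | 0.56, 3.62 | 0.5 |
| ps_ecog 1 - stage early 1 | 0.00 | 0.00, Inf | >0.9 |

Abbreviations: CI = Confidence Interval, HR = Hazard Ratio

An HR of 0.00 with a 95% CI of 0.00 to Inf typically indicates that no events occurred in that group, so the estimate is not interpretable.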
res_surv_response_02$surv[[1]]$`CoxPH plot`

First Systemic Therapy
res_surv_response_02$surv[[2]]$`CoxPH print table`
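| Characteristic | HR | 95% CI | p-value |
|---|---|---|---|
| var_indep_group | | | |
| first_syst_th 0 - stage early 0 | Ref. | | |
| first_syst_th 0 - stage early 1 | 1.64 | 0.47, 5.78 | 0.4 |
| first_syst_th 1 - stage early 0 | 1.41 | 0.32, 6.27 | 0.7 |
| first_syst_th 1 - stage early 1 | 0.00 | 0.00, Inf | >0.9 |

Abbreviations: CI = Confidence Interval, HR = Hazard Ratio

An HR of 0.00 with a 95% CI of 0.00 to Inf typically indicates that no events occurred in that group, so the estimate is not interpretable.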
res_surv_response_02$surv[[2]]$`CoxPH plot`

AE Grade 3-4
res_surv_response_02$surv[[3]]$`CoxPH print table`
| Characteristic | HR | 95% CI | p-value |
|---|---|---|---|
| var_indep_group | | | |
| ae_grade_3_4 0 - stage early 0 | Ref. | | |
| ae_grade_3_4 0 - stage early 1 | 0.35 | 0.07, 1.69 | 0.2 |
| ae_grade_3_4 1 - stage early 0 | 0.82 | 0.31, 2.19 | 0.7 |
| ae_grade_3_4 1 - stage early 1 | 0.63 | 0.08, 5.09 | 0.7 |

Abbreviations: CI = Confidence Interval, HR = Hazard Ratio
res_surv_response_02$surv[[3]]$`CoxPH plot`

Monotherapy
res_surv_response_02$surv[[4]]$`CoxPH print table`
| Characteristic | HR | 95% CI | p-value |
|---|---|---|---|
| var_indep_group | | | |
| monotherapy 0 - stage early 0 | Ref. | | |
| monotherapy 0 - stage early 1 | 0.65 | 0.08, 5.13 | 0.7 |
| monotherapy 1 - stage early 0 | 0.37 | 0.11, 1.30 | 0.12 |
| monotherapy 1 - stage early 1 | 0.29 | 0.06, 1.29 | 0.10 |

Abbreviations: CI = Confidence Interval, HR = Hazard Ratio
res_surv_response_02$surv[[4]]$`CoxPH plot`

Gender
res_surv_response_02$surv[[5]]$`CoxPH print table`
| Characteristic | HR | 95% CI | p-value |
|---|---|---|---|
| var_indep_group | | | |
| sex 0 - stage early 0 | Ref. | | |
| sex 0 - stage early 1 | 0.59 | 0.10, 3.64 | 0.6 |
| sex 1 - stage early 0 | 2.05 | 0.59, 7.10 | 0.3 |
| sex 1 - stage early 1 | 2.67 | 0.27, 26.3 | 0.4 |

Abbreviations: CI = Confidence Interval, HR = Hazard Ratio
res_surv_response_02$surv[[5]]$`CoxPH plot`

BMI
res_surv_response_02$surv[[6]]$`CoxPH print table`
| Characteristic | HR | 95% CI | p-value |
|---|---|---|---|
| stage_early | 0.53 | 0.15, 1.87 | 0.3 |
| bmi | 1.05 | 0.97, 1.14 | 0.2 |

Abbreviations: CI = Confidence Interval, HR = Hazard Ratio
res_surv_response_02$surv[[6]]$`CoxPH plot full`

res_surv_response_02$surv[[6]]$`CoxPH forest plot full`

Age
res_surv_response_02$surv[[7]]$`CoxPH print table`
| Characteristic | HR | 95% CI | p-value |
|---|---|---|---|
| stage_early | 0.48 | 0.14, 1.66 | 0.2 |
| age | 1.04 | 1.00, 1.08 | 0.084 |

Abbreviations: CI = Confidence Interval, HR = Hazard Ratio
res_surv_response_02$surv[[7]]$`CoxPH plot full`

res_surv_response_02$surv[[7]]$`CoxPH forest plot full`

5.4.3 Linear regression

5.4.3.1 Time to response

Residual analysis identified linear regression as the most suitable model for the non-zero subset of the time-to-response variable. As the table below shows, BMI was the only tested independent variable with a nominally significant association with \(response\_time\_to\) (estimate 0.286 per unit increase in BMI, 95% CI 0.075 to 0.497, p = 0.010). None of the other variables reached statistical significance.

The model equation: \[ \begin{aligned} \text{Time to response}_i &= \alpha + \beta_{1}\,\text{Independent variable}_i + \beta_{2}\,\text{Stage early}_i + \varepsilon_i\\ \varepsilon_i &\sim \mathcal{N}(0,\,\sigma^2) \end{aligned} \]
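The sketch below illustrates this model and the kind of residual check that motivates linear regression. It fits the same formula to simulated data, where all values are hypothetical and not study data, and runs a Shapiro–Wilk test on the residuals. It also shows that a 95% confidence interval from `confint()` is bounded by the 2.5% and 97.5% quantiles.

```r
# Simulated stand-in data (hypothetical), not the study cohort
set.seed(1)
n <- 40
sim <- data.frame(
  indep_value = rnorm(n, mean = 27, sd = 4),  # e.g. a BMI-like covariate
  stage_early = rbinom(n, 1, 0.6)             # 0/1 indicator
)
sim$response_time_to <- 2 + 0.3 * sim$indep_value - 1 * sim$stage_early +
  rnorm(n, sd = 2)

# Time to response ~ independent variable + stage_early
fit_lm <- lm(response_time_to ~ indep_value + stage_early, data = sim)
coef(summary(fit_lm))          # estimates, SEs, t statistics, p-values
confint(fit_lm, level = 0.95)  # columns "2.5 %" and "97.5 %"

# Normality of residuals (a QQ plot, e.g. qqnorm(), is the visual counterpart)
sw <- shapiro.test(resid(fit_lm))
sw
```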

library(dplyr)
library(tidyr)
library(purrr)
library(broom)
library(ggpubr)
library(knitr)
library(kableExtra)

# One linear model per independent variable, each adjusted for stage_early,
# fitted on patients with a non-zero time to response
res_timeTOeffect_02 <- d04 |> 
  select(response_time_to, stage_early, all_of(variab_indep_02)) |> 
  filter(response_time_to > 0) |> 
  mutate(sex = if_else(sex == "M", 1, 0)) |>   # recode sex: M = 1, F = 0
  pivot_longer(cols = -c("response_time_to", "stage_early"),
               names_to = "indep_name",
               values_to = "indep_value") |> 
  group_by(indep_name) |> 
  nest() |> 
  mutate(mod = map(data, ~lm(response_time_to ~ indep_value + stage_early, data = .x)),
         tidier = map(mod, ~tidy(.x, conf.int = TRUE)),   # 95% CI by default
         qqplot = map2(mod, indep_name,
                       ~ggqqplot(tibble(value = resid(.x)), x = "value", title = .y)))


# Drop intercepts, keep one row per (variable, term)
res_timeTOeffect_02_tab <- res_timeTOeffect_02 |> 
  unnest(tidier) |> 
  filter(term != "(Intercept)") |> 
  select(-(data:mod), -qqplot)


res_timeTOeffect_02_tab |> 
  select(-std.error, -statistic) |> 
  kable(col.names = c("Independent", "Term", "Estimate", "p-value",
                      "95% CI lower", "95% CI upper"),
        digits = 3) |> 
  kable_styling(
    full_width = FALSE,
    latex_options = c("hold_position"),  # stop table floating
    bootstrap_options = c("striped", "hover", "condensed", "responsive")
  ) |> 
  collapse_rows(columns = 1, valign = "top") |> 
  footnote(general_title = "Note.",
           footnote_as_chunk = TRUE,
           threeparttable = TRUE,
           general = "Data without zero values.")
| Independent | Term | Estimate | p-value | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|
| age | indep_value | -0.012 | 0.792 | -0.104 | 0.080 |
| | stage_early | -1.265 | 0.393 | -4.264 | 1.735 |
| sex | indep_value | -1.498 | 0.224 | -3.975 | 0.979 |
| | stage_early | -1.986 | 0.207 | -5.145 | 1.172 |
| bmi | indep_value | 0.286 | 0.010 | 0.075 | 0.497 |
| | stage_early | -0.263 | 0.843 | -2.979 | 2.452 |
| ps_ecog | indep_value | 0.334 | 0.789 | -2.215 | 2.883 |
| | stage_early | -1.209 | 0.416 | -4.215 | 1.798 |
| first_syst_th | indep_value | -2.013 | 0.268 | -5.671 | 1.645 |
| | stage_early | 0.192 | 0.920 | -3.727 | 4.111 |
| ae_grade_3_4 | indep_value | -1.339 | 0.254 | -3.702 | 1.023 |
| | stage_early | -1.791 | 0.242 | -4.867 | 1.285 |
| monotherapy | indep_value | -1.186 | 0.399 | -4.031 | 1.658 |
| | stage_early | -0.367 | 0.837 | -3.994 | 3.261 |

Note. Data without zero values.

6 Conclusion

This analysis explored both the tolerability and the efficacy of bexarotene in patients with cutaneous T-cell lymphoma (CTCL). Monotherapy was associated with a higher likelihood of treatment discontinuation due to adverse events, and BMI significantly influenced the risk of developing severe hypertriglyceridemia. These findings suggest that both treatment strategy and metabolic profile play important roles in managing toxicity.

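Regarding efficacy, early-stage patients were more likely to receive first-line systemic therapy or monotherapy, both of which were associated with treatment response. In the linear models for time to response, BMI was the only variable with a nominally significant association (estimate 0.286 per unit increase, p = 0.010). None of the other tested variables were significantly associated with time to response.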

Survival analyses illustrated differences in treatment duration and time to next therapy across several clinical subgroups. Statistical significance was not consistent, however. Most hazard ratios had wide confidence intervals that included 1, and age (HR 1.04 per year, p = 0.028) was among the few exceptions. Overall, the findings emphasize the importance of personalized risk assessment when initiating bexarotene therapy, particularly with respect to BMI and therapeutic approach.

7 Session info

Platform
df_session_platform <- devtools::session_info()$platform %>% 
  unlist(.) %>% 
  as.data.frame(.) %>% 
  rownames_to_column(.)

colnames(df_session_platform) <- c("Setting", "Value")

kable(
  df_session_platform, 
  booktabs = TRUE, 
  align = "l",
  caption = "R session platform settings",  # full caption for the main document
  caption.short = " "                        # short caption for the list of tables
) %>% 
  kable_styling(full_width = FALSE,
                latex_options = c(
                  "hold_position" # stop table floating
                ),
                bootstrap_options = c("striped", "hover", "condensed", "responsive")
  ) 
| Setting | Value |
|---|---|
| version | R version 4.5.0 (2025-04-11 ucrt) |
| os | Windows 10 x64 (build 19045) |
| system | x86_64, mingw32 |
| ui | RTerm |
| language | (EN) |
| collate | Czech_Czechia.utf8 |
| ctype | Czech_Czechia.utf8 |
| tz | Europe/Prague |
| date | 2025-05-23 |
| pandoc | 3.2 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown) |
| quarto | NA @ C:11.exe |
Used packages
subset(data.frame(sessioninfo::package_info()), attached==TRUE, c(package, loadedversion, date))